Informed Exploration: A Satisficing Approach to Q-learning

Authors

  • Michael A. Goodrich
  • Todd S. Peterson
Abstract

In the design of robots and automated systems, it is often desirable to endow decision-making agents with an ability to perform self-governed learning. Since many environments produce situations that have not been anticipated, even by the most clever designers, an agent must be endowed with a flexibility to explore options and decide issues that extend beyond rote execution of designer commands. With this research vision as the backdrop, we present a modification to traditional Q-learning that allows a designer to optimally restrict the set of feasible exploration strategies while simultaneously permitting an agent to freely explore within these restrictions. This approach uses a rule of satisficing decision-making that balances the tradeoff between exploiting current knowledge and exploring possible alternatives.
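The idea of restricting exploration to a designer-approved set of options can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give their satisficing rule, so the code below assumes a simple stand-in in which an action counts as "good enough" when its Q-value lies within a hypothetical `margin` of the best action's value. The names `satisficing_set`, `choose_action`, `q_update`, and `margin` are all illustrative assumptions; only the one-step Q-learning update is the standard one.

```python
import random


def satisficing_set(q_row, margin):
    """Indices of actions whose Q-value is within `margin` of the best.

    Assumption: this threshold rule is a stand-in for the paper's
    satisficing criterion, which the abstract does not specify.
    """
    best = max(q_row)
    return [a for a, q in enumerate(q_row) if q >= best - margin]


def choose_action(q_row, margin, rng):
    """Explore uniformly at random, but only among 'good enough' actions."""
    return rng.choice(satisficing_set(q_row, margin))


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied in place to table `q`."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Tiny usage example on a hypothetical 2-state, 2-action table.
q = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
a = choose_action(q[0], margin=0.5, rng=rng)
q_update(q, 0, a, r=1.0, s_next=1)
```

With `margin=0` the agent acts greedily, and with a very large `margin` it explores every action uniformly, so under this assumed rule the single parameter spans the range between pure exploitation and unrestricted exploration.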

Similar resources

Learning in a state of confusion : employing active perception and reinforcement learning in partially observable worlds

In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms which attempt to simultaneously learn a Markov m...

Type-Based Exploration with Multiple Search Queues for Satisficing Planning

Utilizing multiple queues in Greedy Best-First Search (GBFS) has been proven to be a very effective approach to satisficing planning. Successful techniques include extra queues based on Helpful Actions (or Preferred Operators), as well as using Multiple Heuristics. One weakness of all standard GBFS algorithms is their lack of exploration. All queues used in these methods work as priority queues...

Use of Evidence-Informed Deliberative Processes – Learning by Doing; Comment on “Use of Evidence-informed Deliberative Processes by Health Technology Assessment Agencies Around the Globe”

The article by Oortwijn, Jansen, and Baltussen (OJB) is much more important than it appears because, in the absence of any good general theory of “evidence-informed deliberative processes” (EDP) and limited evidence of how they might be shaped and work in institutionalising health technology assessment (HTA), the best approach seems to be to accumulate the experience of...

Using Satisficing Game Theory for Performance Evaluation of Banks’ Branches (Case Study in the Mellat Bank)

Due to its role in the identification of inefficient branches and deciding the consistency of their activities, evaluating the performance of a bank's branches is one of the most important decisions in the field of development and regulation of branch network. In this paper, the satisfactory functions based on game theory strategies have been utilized in order to evaluate the individual...

Consistent exploration improves convergence of reinforcement learning on POMDPs

This paper sets out the concept of consistent exploration of observation-action pairs. We present a new temporal difference algorithm, CEQ(λ), based on this concept and demonstrate using a randomly generated set of partially observable Markov decision processes (POMDPs) that it outperforms SARSA(λ). This result should generalise to any POMDP where satisficing policies which map observations to ...

Publication date: 1999